Skip to content

Conversation

jatin-bhateja
Copy link
Member

@jatin-bhateja jatin-bhateja commented Sep 3, 2025

This patch optimizes PopCount value transforms using KnownBits information.
Following are the results of the micro-benchmark included with the patch

System: 13th Gen Intel(R) Core(TM) i3-1315U

Baseline:
Benchmark                                      Mode  Cnt       Score   Error  Units
PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s

Withopt:
Benchmark                                      Mode  Cnt       Score   Error  Units
PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s

Kindly review and share your feedback.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8365205: C2: Optimize popcount value computation using knownbits (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27075/head:pull/27075
$ git checkout pull/27075

Update a local copy of the PR:
$ git checkout pull/27075
$ git pull https://git.openjdk.org/jdk.git pull/27075/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27075

View PR using the GUI difftool:
$ git pr show -t 27075

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27075.diff

Using Webrev

Link to Webrev Comment

@jatin-bhateja
Copy link
Member Author

/label add hotspot-compiler-dev

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 3, 2025

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 3, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Sep 3, 2025

@jatin-bhateja
The hotspot-compiler label was successfully added.

@jatin-bhateja jatin-bhateja marked this pull request as ready for review September 4, 2025 05:42
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 4, 2025
@mlbridge
Copy link

mlbridge bot commented Sep 4, 2025

@SirYwell
Copy link
Member

SirYwell commented Sep 4, 2025

The change looks good, but I wonder:

  • if it makes sense to have some kind of IR tests (i.e., it's folded away when unneeded, when the input is a constant, ...)?
  • whether the explanation could be simplified: Assuming a correct implementation of the KnownBits canonicalization, we can argue
    • _zeroes has the bits set that are known to be always 0. So BitsPer<Type> - popCount(x) gives you an upper limit of how many bits might be 1. And BitsPer<Type> - popCount(_zeroes) is equivalent to popCount(~_zeroes).
    • _ones has the bits set that are known to be always 1. Trivially, popCount(_ones) is a valid lower bound.
    • The rest repeats how adjust_bits_from_unsigned_bounds works, but that's not specific to the popcount nodes.

@jatin-bhateja
Copy link
Member Author

Hi @TobiHartmann , @SirYwell , @eme64 , can you kindly verify the changes in the latest patch?

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice improvement @jatin-bhateja , thanks for working on it :)

Comment on lines 35 to 79
@Benchmark
public int StockKernelInt() {
int res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
int constrained_i = i & 0xFFFF;
res += constrained_i;
}
return res;
}

@Benchmark
public int LogicFoldingKerenlInt() {
int res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
int constrained_i = i & 0xFFFF;
if (Integer.bitCount(constrained_i) > 16) {
throw new AssertionError("Uncommon trap");
}
res += constrained_i;
}
return res;
}

@Benchmark
public long StockKernelLong() {
long res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
long constrained_i = i & 0xFFFFFFL;
res += constrained_i;
}
return res;
}

@Benchmark
public long LogicFoldingKerenLong() {
long res = 0;
for (int i = lower_bound; i < upper_bound; i++) {
long constrained_i = i & 0xFFFFFFL;
if (Long.bitCount(constrained_i) > 24) {
throw new AssertionError("Uncommon trap");
}
res += constrained_i;
}
return res;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the stock kernels are there to show performance if there is no op, the folding kernels you hope have the same performance. It would be nice to have one where the bitCount does not fold away, just to keep that comparison :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, on a second thought, since any benchmarks compare the performance of kernels with and without optimization it's better to do away with the stock variants and only retain folding kernels.

@eme64
Copy link
Contributor

eme64 commented Sep 18, 2025

@jatin-bhateja I'm going to be out of the office for about 3 weeks, so feel free to ask others for reviews!

@openjdk
Copy link

openjdk bot commented Sep 18, 2025

@jatin-bhateja The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@jatin-bhateja
Copy link
Member Author

Hi @eme64 , @chhagedorn , @SirYwell , let me know if its good to land now.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm about to board my plane for a 3-week vacation, I'm leaving a last little note for the reviewers.

I think this is a really nice addition, so thanks for doing it @jatin-bhateja 😊 . Though it will only reach its full potential once we implement more "basic" KnownBits optimizations such as JDK-8367341.

Please make sure you test it, and make sure the random values generated with the Generators in test/hotspot/jtreg/compiler/intrinsics/TestPopCountValueTransforms.java make sense. Currently, there is for example a 32 bit range for a 64 bit long value, which is not correct, I think.

By default, my recommendation is to not constrain the Generators ranges, unless there is a really good reason. Generators are already built to produce values close to zero at an over-proportional rate. But by not restricting we may at some point also hit cases that we did not anticipate, and catch bugs that way.

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Sep 19, 2025

As I'm about to board my plane for a 3-week vacation, I'm leaving a last little note for the reviewers.

I think this is a really nice addition, so thanks for doing it @jatin-bhateja 😊 . Though it will only reach its full potential once we implement more "basic" KnownBits optimizations such as JDK-8367341.

Correct, currently KnownBits information is constrained as they are generated for limited value ranges, as discussed in
#27075 (comment)

// We use the KnownBits information from the integer types to derive how many one bits
// we have at least and at most.
// From the definition of KnownBits, we know:
// zeros: Indicates which bits must be 0: ones[i] =1 -> t[i]=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this, is ones[i] mixed up with zeros[i]? I.e., t[i]=0 if zeros[i]=1

Copy link
Member Author

@jatin-bhateja jatin-bhateja Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jatin-bhateja
Copy link
Member Author

Hi @SirYwell , @chhagedorn , @eme64 , I have addressed your comments. Let me know if this is good to land in.

Copy link
Member

@SirYwell SirYwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, although I'm not exactly sure about the semantics of widen and whether Type::WidenMax is the right choice here. Someone else has to look at that.

Thanks for the work!

Copy link
Member

@liach liach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion, we might do this as a later RFE.

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Sep 29, 2025

Looks good to me now, although I'm not exactly sure about the semantics of widen and whether Type::WidenMax is the right choice here. Someone else has to look at that.

Thanks for the work!

Hi @TobiHartmann @chhagedorn , can you kindly run this through your testing and share if its good to land in.

@TobiHartmann
Copy link
Member

I submitted testing for this and will report back once it finished.

return Type::TOP;
}
KnownBits<juint> bits = t->isa_int()->_bits;
return TypeInt::make(population_count(bits._ones), population_count(~bits._zeros), Type::WidenMax);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The widen of the output should be the same as the widen of the input, not WidenMax here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @merykitty, widen is mainly used for optimistic data flow analysis pass like CCP where type analysis begins with TOP and progressively grows the value range till convergence / fixed point.
it's good to preserve the widen of input to delay eager convergence.

@TobiHartmann
Copy link
Member

Testing all passed. I'll pass the review to someone else.

@eirbjo
Copy link
Contributor

eirbjo commented Oct 6, 2025

Is the core-libs label appropriate for this PR? Looks hotspot specific?

@SirYwell
Copy link
Member

SirYwell commented Oct 6, 2025

Is the core-libs label appropriate for this PR? Looks hotspot specific?

That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific.

@eirbjo
Copy link
Contributor

eirbjo commented Oct 6, 2025

That label was added automatically, closely after https://mail.openjdk.org/pipermail/jdk-dev/2025-September/010486.html. Not sure why, but the change is definitely hotspot specific.

Right, makes sense. I'll go ahead and remove that label.

/label remove core-libs

@openjdk
Copy link

openjdk bot commented Oct 6, 2025

@eirbjo
The core-libs label was successfully removed.

@jatin-bhateja
Copy link
Member Author

jatin-bhateja commented Oct 6, 2025

Hi @merykitty , Can you kindly be the second reviewer?

@liach
Copy link
Member

liach commented Oct 6, 2025

FYI this patch was marked core-libs because of the benchmark addition in java/lang. I wonder if it belongs to vm/compiler instead.

Copy link
Member

@merykitty merykitty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

if (t == Type::TOP) {
return Type::TOP;
}
const TypeInt* tint = t->isa_int();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be is_int. isa_int is fine, but in cases of unexpected types, it will SIGSEGV instead of throwing an assertion, which is more difficult to debug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler [email protected] rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

8 participants